Abstract

We study the connection between the order of phase transitions in combinatorial
problems and the complexity of decision algorithms for such problems. We
rigorously show that, for a class of random constraint satisfaction problems, a
limited connection between the two phenomena indeed exists. Specifically, we extend
the definition of the spine order parameter of Bollobás et al. [7] to random
constraint satisfaction problems, rigorously showing that for such problems a
discontinuity of the spine is associated with a $2^{\Omega(n)}$ resolution complexity
(and thus a $2^{\Omega(n)}$ complexity of DPLL algorithms) on random instances. The two
phenomena have a common underlying cause: the emergence of “large” (linear size)
minimally unsatisfiable subformulas of a random formula at the satisfiability phase
transition.

We present several further results that add weight to the intuition that random 
constraint satisfaction problems with a sharp threshold and a continuous spine are 
“qualitatively similar to random 2-SAT”. Finally, we argue that it is the spine rather 
than the backbone parameter whose continuity has implications for the decision 
complexity of combinatorial problems, and we provide experimental evidence that 
the two parameters can behave in a different manner. 

Keywords: constraint satisfaction problems, phase transitions, spine, resolution 
complexity. 
1 Introduction

The major promise of phase transitions in combinatorial problems has been to shed
light on the “practical” algorithmic complexity of combinatorial problems. A possible
connection has been highlighted by results of Monasson et al. [?] that are based on
experimental evidence and nonrigorous arguments from statistical mechanics. Studying
a version of random satisfiability that “interpolates” between 2-SAT and 3-SAT,
they suggested that the order of the phase transition, combinatorially expressed by
continuity of an order parameter called the backbone, might have implications for the
problem’s typical-case complexity. A discontinuous or first-order transition appeared
to be symptomatic of exponential complexity, whereas a continuous or second-order
transition was correlated with polynomial complexity.

It is understood by now that this connection is limited. For instance, k-XOR-SAT
is a problem believed, based on arguments from statistical mechanics [?], to have a
first-order phase transition. But it is easily solved by a polynomial algorithm, Gaussian
elimination. So, if any connection exists between first-order phase transitions and the
complexity of a given problem, it cannot involve all polynomial time algorithms for the
problem. Fortunately, this does not end all hopes for a connection with computational
complexity: descriptive complexity provides a principled way to measure the
complexity of problems with respect to more limited classes of algorithms, those
expressible in a given framework. Here we focus on the Davis-Putnam-Logemann-Loveland
(DPLL) class of algorithms [?].

One way to identify the connection between phase transitions and computational
complexity is to formalize the underlying intuition connecting the two notions in a
purely combinatorial way, devoid of any physics considerations. First-order phase
transitions amount to a discontinuity in the (suitably rescaled) size of the backbone.
For random k-SAT [?], and more specifically for the optimization problem MAX-k-SAT,
the backbone has a combinatorial interpretation: it is the set of literals that are
“frozen”, or assume the same value, in all optimal assignments. Intuitively, a large
backbone size has implications for the complexity of finding such assignments: all
literals in the backbone require specific values in order to satisfy the formula optimally,
but an algorithm assigning variables in an iterative fashion has very few ways to know
what those “right” values to assign are. In the case of a first-order phase transition,
the backbone of formulas just above the transition contains, with high probability, a
fraction of the literals that is bounded away from zero. An algorithm such as DPLL
that assigns values to variables iteratively may misassign a backbone variable whose
height, in a binary tree characterizing the behavior of the algorithm, is $\Omega(n)$,
where n is the number of variables. This would force a backtrack on the tree. Assuming the
algorithm cannot significantly “reduce” the size of the explored portion of this tree, a
first-order phase transition would then w.h.p. imply a $2^{\Omega(n)}$ lower bound for the
running time of DPLL on random instances located slightly above the transition.

There exists, however, a significant flaw in the heuristic argument above: the
backbone is defined with respect to optimal assignments for the given formula, meaning
assignments that satisfy the largest possible number of clauses (or all of them, in the
case where the formula is satisfiable). The argument suggests that a discontinuity in the
backbone size will make it difficult for algorithms that assign variables in an iterative
manner to find optimal solutions. The complexity of the optimization problem is,
however, often different from that of the corresponding decision problem. For instance, that
is the case in XOR-SAT, where the decision problem is easy but the optimization
problem is hard. As mentioned above, XOR-SAT is presumed to have a first-order phase
transition, so it is not clear at all that the continuity or discontinuity of the backbone
should be the relevant predictor for the complexity of the decision problem as well.

Fortunately, it turns out that the intuition of the previous argument also holds for a
different order parameter, a “weaker” version of the backbone called the spine,
introduced in [7] in order to prove that random 2-SAT has a second-order phase transition.
Unlike the backbone, the spine is defined in terms of the decision problem, hence it
could conceivably have a larger impact on the complexity of these problems. Of course,
the same caveat applies as for the backbone: any connection with computational
complexity can only involve complexity classes that have weaker expressive power than the
class of polynomial time algorithms.

We aim in this paper to provide evidence that for random constraint satisfaction
problems it is the behavior of the spine, rather than the backbone, that impacts the
complexity of the underlying decision problem. To accomplish this:

1. We discuss the proper definitions of the backbone and spine for random
constraint satisfaction problems (CSP).

2. We formally establish a simple connection between a discontinuity in the
relative size of the spine at the threshold and the resolution complexity of random
satisfiability problems. In a nutshell, a necessary and sufficient condition for
the existence of a discontinuity is the existence of an $\Omega(n)$ lower bound (w.h.p.)
on the size of minimally unsatisfiable subformulas of a random (unsatisfiable)
formula. But standard methods from proof complexity [8] imply that for all
problems where we can prove such an $\Omega(n)$ lower bound, there is a $2^{\Omega(n)}$ lower
bound on their resolution complexity and hence on the complexity of DPLL
algorithms as well [?]. This property arises from the expansion of the underlying
formula’s hypergraph, and is independent of the precise definition of the problem
at hand. Conversely we show (Theorem 1) that for any generalized satisfiability
problem, a second-order phase transition implies, in the region where most
formulas are unsatisfiable, an upper bound on resolution complexity that is smaller
than any exponential: $O(2^{\epsilon n})$ for every $\epsilon > 0$.

3. We give a sufficient condition (Theorem 2) for the existence of a discontinuous
jump in the size of the spine. We then show (Theorem 3) that this condition is
fulfilled by all problems whose constraints have no implicates of size two or less.
Qualitatively, our results suggest that all satisfiability problems with a continuous
phase transition in the spine are “2-SAT-like”.

4. Finally, we present experimental results that attempt to clarify whether the
backbone and the spine can behave differently at the phase transition. The graph
bipartition problem (GBP) is one case where this seems to happen. In contrast,
for random 3-coloring (3-COL), the backbone and spine appear to have similar
behavior.

A note on the significance of our results: a first-order transition or discontinuity
in the size of the spine is weaker than a discontinuity in the size of the backbone. In
the last section of the paper we give a numerical demonstration of an example where
the backbone and spine behave differently. And unlike for the backbone, we do not
have a physical interpretation for the spine. But this is not our intention. The
argument connecting the continuity of the backbone order parameter with the complexity
of decision problems is problematic, and what we rigorously show is that, with no
physical considerations in mind, the intuitive connection holds instead for the spine.


2 Preliminaries 


Throughout this paper we assume a general familiarity with the concepts of phase
transitions in combinatorial problems [?], random structures [?], and proof
complexity [?]. We assume more detailed familiarity with certain fundamental results on sharp
thresholds [12], [13], [?], and we make use of some of the methods associated with those
results.

Two models arising in the theory of random structures are: 

• The constant probability model $\Gamma(n, p)$. A random string of bits
$X \in \Gamma(n, p)$ is obtained by independently setting each bit of X to 1 with
probability p, and the rest to 0.

• The counting model $\Gamma(n, m)$. A random string $X \in \Gamma(n, m)$ is obtained by
setting m bits of X, chosen uniformly at random, to 1 and the rest to 0.
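
The two models can be sketched in a few lines of Python. This is only an illustration; the function names and the use of Python's `random.Random` generator are assumptions of the sketch, not part of the original text.

```python
import random


def sample_constant_probability(n, p, rng):
    """Draw X from Gamma(n, p): each bit is 1 independently with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]


def sample_counting(n, m, rng):
    """Draw X from Gamma(n, m): exactly m positions, chosen uniformly, are set to 1."""
    ones = set(rng.sample(range(n), m))
    return [1 if i in ones else 0 for i in range(n)]


rng = random.Random(0)
x = sample_constant_probability(20, 0.3, rng)  # number of ones is random
y = sample_counting(20, 6, rng)                # number of ones is exactly 6
```

In the first model the number of ones fluctuates around pn, while in the second it is fixed to m; this is the difference that the "fairly liberal conditions" mentioned later allow one to ignore.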

For the following purposes, let us work within the constant probability model.
Consider a property A that is monotonically increasing, in that if A holds for a given string
of bits X, then changing any of these bits from 0 to 1 preserves property A. For any
$\epsilon > 0$, let $p_\epsilon = p_\epsilon(n)$ be the canonical probability such that
$\mathrm{Prob}_{X \in \Gamma(n, p_\epsilon(n))}[X \text{ satisfies } A] = \epsilon$,
where $p_\epsilon$ increases monotonically with $\epsilon$. Sharp thresholds are those for
which the function has a “sudden jump” from value 0 to 1:

Definition 1 Property A has a sharp threshold iff for every $0 < \epsilon < 1/2$, we have
$\lim_{n \to \infty} \frac{p_{1-\epsilon}(n) - p_\epsilon(n)}{p_{1/2}(n)} = 0$.
A has a coarse threshold if for some $\epsilon > 0$ it holds that
$\liminf_{n \to \infty} \frac{p_{1-\epsilon}(n) - p_\epsilon(n)}{p_{1/2}(n)} > 0$.

We will use the model of random constraint satisfaction from Molloy [?]:

Definition 2 Let $\mathcal{D} = \{0, 1, \ldots, t-1\}$, $t \ge 2$, be a fixed set. Consider
all $2^{t^k} - 1$ possible nonempty sets of k-ary constraint templates (relations) with
values taken from $\mathcal{D}$. Let C be such a nonempty set of constraint templates.

A random formula $\Phi \in$ CSP(C) is a set of constraints specified under the counting
model by the following procedure:


1. Select, uniformly at random and with replacement, m hyperedges of the complete 
k-uniform hypergraph on n variables. 

2. For each hyperedge, choose a random ordering of the variables involved in it. 
Choose a random constraint template from C and apply it to the list of (ordered) 
variables. 
We use the notation SAT(C) (instead of CSP(C)) when t = 2.

For an instance $\Phi \in$ CSP(C), we denote by $Var(\Phi)$ the set of variables that
actually appear in $\Phi$, and by $opt(\Phi)$ the number of constraints left unsatisfied by
an optimal assignment for $\Phi$.
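
The sampling procedure of Definition 2 and the quantity opt can be illustrated with a small brute-force sketch. It assumes that a constraint template is represented as a set of allowed k-tuples of values and that variables are numbered 0..n-1; these representation choices, and all function names, are assumptions of the sketch.

```python
import itertools
import random


def random_formula(n, m, k, templates, rng):
    """Sample m constraints; each is (ordered tuple of k distinct variables, template)."""
    formula = []
    for _ in range(m):
        # rng.sample returns k distinct variables in uniformly random order, which
        # combines "pick a random hyperedge" with "pick a random ordering of it".
        scope = tuple(rng.sample(range(n), k))
        formula.append((scope, rng.choice(templates)))
    return formula


def variables(formula):
    """Var(Phi): the variables that actually appear in the formula."""
    return {v for scope, _ in formula for v in scope}


def opt(formula, domain_size):
    """Fewest violated constraints over all assignments (exhaustive search)."""
    vs = sorted(variables(formula))
    best = len(formula)
    for values in itertools.product(range(domain_size), repeat=len(vs)):
        a = dict(zip(vs, values))
        violated = sum(tuple(a[v] for v in scope) not in tmpl for scope, tmpl in formula)
        best = min(best, violated)
    return best


# Templates of 3-SAT: each template forbids exactly one of the 8 value tuples.
all_tuples = list(itertools.product((0, 1), repeat=3))
three_sat = [frozenset(all_tuples) - {bad} for bad in all_tuples]
phi = random_formula(n=6, m=10, k=3, templates=three_sat, rng=random.Random(1))

# Two contradictory binary constraints on the same pair of variables: opt is 1.
tiny = [((0, 1), frozenset({(0, 1), (1, 0)})), ((0, 1), frozenset({(0, 0)}))]
```

The exhaustive search is exponential in the number of variables, so it is only meant for checking the definitions on very small instances.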

Just as in random graphs [?], under fairly liberal conditions one can use the
constant probability model instead of the counting model from the previous definition. The
interesting range of the parameter m is when the ratio m/n is a constant, c, called the
constraint density. The original investigation of the order of the phase transition in
k-SAT used an order parameter called the backbone,

$B(\Phi) = \{x \in Var(\Phi) \mid \exists W \in \{x, \bar{x}\} : opt(\Phi \cup W) > opt(\Phi)\}$, (1)

or more precisely the backbone fraction

$f_B(\Phi) = \frac{|B(\Phi)|}{n}$. (2)


Bollobás et al. [7] have investigated the order of the phase transition in k-SAT (for
k = 2) under a different order parameter, a “monotonic version” of the backbone called
the spine

$S(\Phi) = \{x \in Var(\Phi) \mid \exists W \in \{x, \bar{x}\}, \exists\, \Xi \subseteq \Phi : \Xi \in SAT, \Xi \wedge W \in \overline{SAT}\}$. (3)

Here, “$\in SAT$” means “is satisfiable” and “$\in \overline{SAT}$” means “is unsatisfiable”.
The corresponding version of Eq. (2) is

$f_S(\Phi) = \frac{|S(\Phi)|}{n}$. (4)


They showed that random 2-SAT has a continuous (second-order) phase transition:
$f_S$ approaches zero w.h.p. (as $n \to \infty$) for constraint density
$c < c_{2\text{-SAT}} = 1$, and is continuous at $c = c_{2\text{-SAT}}$. By contrast,
nonrigorous arguments from statistical mechanics [6] imply that for 3-SAT the parameter
$f_B$ jumps discontinuously from zero to a positive value at the transition point
$c = c_{3\text{-SAT}}$ (a first-order phase transition).
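
To make Eqs. (1) and (3) concrete, the following brute-force sketch computes the backbone and the spine of a small CNF formula. Clauses are written as tuples of nonzero integers (a negative integer denotes a negated variable); this encoding and the function names are assumptions of the sketch, and the search is exponential, so it is only usable on toy instances.

```python
import itertools


def falsified(clauses, assignment):
    """Number of clauses falsified by assignment (dict: variable -> bool)."""
    return sum(not any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)


def opt(clauses, variables):
    """Fewest falsified clauses over all assignments to the given variables."""
    vs = sorted(variables)
    return min(
        falsified(clauses, dict(zip(vs, bits)))
        for bits in itertools.product([False, True], repeat=len(vs))
    )


def var_set(clauses):
    return {abs(l) for c in clauses for l in c}


def backbone(clauses):
    """Eq. (1): x is in B if adding the unit clause x or not-x increases opt."""
    vs = var_set(clauses)
    base = opt(clauses, vs)
    return {x for x in vs if any(opt(clauses + [(w,)], vs) > base for w in (x, -x))}


def spine(clauses):
    """Eq. (3): x is in S if some satisfiable subformula is unsatisfiable with x or not-x."""
    vs = var_set(clauses)
    result = set()
    for r in range(len(clauses) + 1):
        for sub in itertools.combinations(clauses, r):
            sub = list(sub)
            if opt(sub, vs) != 0:
                continue  # only satisfiable subformulas qualify
            for x in vs - result:
                if any(opt(sub + [(w,)], vs) > 0 for w in (x, -x)):
                    result.add(x)
    return result


contradiction = [(1,), (-1,)]      # x1 and not-x1: unsatisfiable
forced = [(1, 2), (-1, 2)]         # satisfiable, forces x2 to be true
```

On the unsatisfiable formula `contradiction` the backbone is empty while the spine is {1}, a toy instance of the two parameters differing; on `forced` both are {2}.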


3 How to define the backbone/spine for random CSP 
(and beyond) 

We would like to extend the concepts of backbone and spine to general constraint 
satisfaction problems. The extended definitions must preserve as many of the properties 
of the backbone/spine as possible. 

Certain differences between the case of random k-SAT and the general case force
us to employ an alternative definition of the backbone/spine. The most obvious is that
the definitions in Eqs. (1) and (3) involve negations of variables, unlike Molloy’s model.
Also, these definitions are inadequate for problems whose solution space presents a
relabeling symmetry, such as the case of graph coloring where the set of (optimal)
colorings is closed under permutations of the colors. Due to this symmetry, no variable
can be “frozen” to a fixed value as in Eq. (1).

We therefore define the backbone/spine of a random instance of CSP(C) in a
slightly different manner. Let $\mathbf{C}$ be the set of constraints obtained by applying
the constraint templates in C to all ordered lists of k variables chosen from the set of
all n variables.

Definition 3

$B(\Phi) = \{x \in Var(\Phi) \mid \exists C \in \mathbf{C} : x \in C,\ opt(\Phi \cup C) > opt(\Phi)\}$,

$S(\Phi) = \{x \in Var(\Phi) \mid \exists C \in \mathbf{C}, \exists\, \Xi \subseteq \Phi : x \in C,\ \Xi \in CSP,\ \Xi \cup C \in \overline{CSP}\}$.

For k-CNF formulas whose (original) backbone/spine contains at least three
literals, a variable x is in the (new version of the) backbone/spine if and only if either
x or $\bar{x}$ was present in the old version. In particular the new definition does not
change the order of the phase transition of random k-SAT.

Alternatively, in studying 3-colorability (3-COL) of random graphs G = (V, E),
Culberson and Gent [?] defined the spine of a colorable graph G to be the set of vertex
pairs (x, y) that get assigned the same color in all colorings of G.

Following up on the idea of defining the backbone and spine in terms of constraints
rather than variables, and by analogy with the definition in [7], one can extend the
definition of S(G) to general graphs^1 by

$S(G) = \{(x, y) \in V^2 \mid \exists H \subseteq G : H \in 3\text{-COL},\ H \cup (x, y) \in \overline{3\text{-COL}}\}$. (5)

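We can further extend these definitions to all random constraint satisfaction
problems CSP(C):

Definition 4 With $\mathbf{C}$ denoting the set of all constraints obtained by applying the
templates in C to ordered lists of k variables,

$B_C(\Phi) = \{C \in \mathbf{C} \mid opt(\Phi \cup C) > opt(\Phi)\}$,

$S_C(\Phi) = \{C \in \mathbf{C} \mid \exists\, \Xi \subseteq \Phi : \Xi \in CSP,\ \Xi \cup C \in \overline{CSP}\}$.

Similarly, one can define the backbone/spine fraction by

$f_{B_C}(\Phi) = \frac{|B_C(\Phi)|}{|\mathbf{C}|}$ and $f_{S_C}(\Phi) = \frac{|S_C(\Phi)|}{|\mathbf{C}|}$.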

^1 Culberson and Gent employ an “effective” version of the spine they call frozen development that is more
amenable to experimental analysis. Frozen development is a subset of the spine, as defined in Eq. (5).
We will refer to these concepts as the constraint-based backbone/spine (fractions),
as opposed to the previously defined variable-based quantities. The two are clearly
related. For instance one can easily show that

$B(\Phi) = \bigcup_{C \in B_C(\Phi)} Var(C)$,

where Var(C) represents all variables appearing in constraint C alone. It is also clear
that $|B_C(\Phi)| = O(|B(\Phi)|^k)$, since every constraint in $B_C(\Phi)$ involves only
variables of $B(\Phi)$, and similarly for the spine. Since the set of all constraints has
size $\Theta(n^k)$, it follows that the continuity of $f_B$ or $f_S$ implies the continuity
of $f_{B_C}$ or $f_{S_C}$. However, the converse is not in general true, and so the two
backbone/spine fractions do not necessarily behave in the same way.

Given the two types of definitions, which should we choose? The answer depends
on the problem, as well as on the issue we wish to address. For instance, in the
statistical mechanics analysis of combinatorial problems, the presumably “correct” definition
of the backbone emerges from the analysis undertaken in [?] for random k-SAT. But
since we are interested in a combinatorial definition, with no physics considerations
in mind, the only principled way to choose between the two types of order parameters
(one based on variables, the other based on constraints) is based on the class of
algorithms we are concerned with. In the case of random constraint satisfaction problems
and DPLL algorithms, it is variables that get assigned values, so Definition 3 is
preferred. On the other hand, constraint-based definitions can make sense for problems
that share some characteristics with random 3-COL (i.e., binary constraint satisfaction
problems, and problems with built-in symmetries of the solution space). In a later
section we will see an example, the case of graph bipartition, where the constraint-based
backbone and spine seem to behave differently. (Whether one can come up with a natural
example of this phenomenon for the variable-based backbone is an interesting open
problem.)

4 Spine discontinuity and resolution complexity of random CSP

In this section we will study the continuity of the spine-based order parameter $f_S$ for
boolean random constraint satisfaction, or satisfiability, problems. The kind of
continuous/discontinuous behavior we are looking for is formalized by the following definition
(a similar one can be given for the constraint-based versions of the order parameter):

Definition 5 Let C be such that SAT(C) has a sharp threshold. Problem SAT(C) has
a discontinuous spine if there exists $\eta > 0$ such that for every sequence m = m(n) we
have

$\lim_{n \to \infty} \mathrm{Prob}_{m = m(n)}[\Phi \in SAT] = 0 \implies \lim_{n \to \infty} \mathrm{Prob}_{m = m(n)}[f_S(\Phi) > \eta] = 1$. (6)

If, on the other hand, for every $\epsilon > 0$ there exists a constant $c = c(\epsilon)$ such
that the map $\epsilon \to c(\epsilon)$ is monotonically increasing and

$\lim_{n \to \infty} \mathrm{Prob}_{m = c(\epsilon) n}[\Phi \in SAT] = 0$ and $\lim_{n \to \infty} \mathrm{Prob}_{m = c(\epsilon) n}[f_S(\Phi) > \epsilon] = 0$ (7)

we say that SAT(C) has a continuous spine.
We now give a simple observation that will be the basis for identifying
discontinuities of the spine:

Proposition 1 Let $\Phi$ be a minimally unsatisfiable formula, and let x be a literal that
appears in $\Phi$. Then, by Definition 3, $x \in S(\Phi)$.

Proof. There exists $C \in \Phi$ such that $x \in C$. But $\Phi \setminus C$ is satisfiable
and $(\Phi \setminus C) \cup C$ is not, thus $x \in S(\Phi)$. □


Corollary 1 k-SAT, $k \ge 3$, has a discontinuous spine.

Proof. To show a discontinuous spine it is sufficient to show that a random
unsatisfiable formula contains w.h.p. a minimally unsatisfiable subformula involving a linear
number of literals. In the Chvátal-Szemerédi proof [?] that w.h.p. random k-SAT has
exponential resolution size for $k \ge 3$, the claim is implicitly proved. □


Definition 6 The width of a resolution proof P of the unsatisfiability of a CNF-formula 
F is defined to be the maximum number of literals in any clause that appears in the 
proof P. 

If $\Phi$ is an instance of SAT(C), denote by $Cl(\Phi)$ the CNF formula obtained by
expressing each constraint of $\Phi$ as a conjunction of clauses (i.e., expressing $\Phi$ in
conjunctive form).

The resolution complexity of an instance $\Phi$ of SAT(C) is defined as the length of
the smallest resolution proof of the unsatisfiability of $Cl(\Phi)$.
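
The notions in Definition 6 can be illustrated on a tiny example. The sketch below implements a single resolution step, the width of a proof, and one naive way of turning a Boolean constraint into clauses (one clause per forbidden tuple). The encoding of clauses as frozensets of nonzero integers, the function names, and this particular clausal translation are assumptions of the sketch; the paper does not fix how Cl is computed.

```python
from itertools import product


def resolve(c1, c2, x):
    """Resolvent of c1 (containing x) and c2 (containing -x) on variable x."""
    if x not in c1 or -x not in c2:
        raise ValueError("c1 must contain x and c2 must contain -x")
    return frozenset((c1 - {x}) | (c2 - {-x}))


def width(proof):
    """Maximum number of literals in any clause of the proof."""
    return max(len(c) for c in proof)


def constraint_to_clauses(scope, allowed):
    """Naive clausal form of a Boolean constraint: one clause per forbidden tuple."""
    clauses = []
    for values in product((0, 1), repeat=len(scope)):
        if values not in allowed:
            # This clause is falsified exactly by the forbidden tuple `values`.
            clauses.append(frozenset(v if b == 0 else -v for v, b in zip(scope, values)))
    return clauses


# A refutation of (x1 or x2), (not x1 or x2), (x1 or not x2), (not x1 or not x2).
axioms = [frozenset(c) for c in [(1, 2), (-1, 2), (1, -2), (-1, -2)]]
a = resolve(axioms[0], axioms[1], 1)   # the unit clause x2
b = resolve(axioms[2], axioms[3], 1)   # the unit clause not-x2
empty = resolve(a, b, 2)               # the empty (contradictory) clause
proof = axioms + [a, b, empty]

# The constraint "1-in-3" on variables 1, 2, 3 and its naive clausal form.
one_in_three = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}
cl = constraint_to_clauses((1, 2, 3), one_in_three)
```

Here the proof has length 7 and width 2, and the 1-in-3 constraint yields five clauses of three literals each.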

A simple observation is that a continuous spine has implications for resolution
complexity:

Theorem 1 Let C be a set of constraint templates such that SAT(C) has a continuous
spine. Then for every constraint density $c > \lim_{\epsilon \to 0} c(\epsilon)$, and every
$\epsilon > 0$, random formulas of constraint density c have w.h.p. resolution complexity
$O(2^{\epsilon n})$.

Proof. Because of Proposition 1 and the fact that SAT(C) has a continuous spine, for
every $\epsilon > 0$, minimally unsatisfiable subformulas of a random formula $\Phi$ with
constraint density $c(\epsilon)$ contain w.h.p. at most $\epsilon n$ variables. Consider the
backtrack tree of the natural DPLL algorithm that tries to satisfy constraints one at a time
on such a minimally unsatisfiable subformula F. By the usual correspondence between DPLL
refutations and resolution complexity (e.g., [?]) this yields a resolution proof of the
unsatisfiability of $\Phi$ having size at most $2^{\epsilon n}$.

Taking $\epsilon$ to be small enough that $c(\epsilon) < c$, and using the fact that
resolution complexity of a random formula is a monotonically decreasing function of the
constraint density, we get the desired result. □
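
The backtrack tree used in the proof can be made concrete with a minimal DPLL sketch that also counts the nodes it visits. This is a generic textbook-style DPLL on CNF clauses with unit propagation and a naive branching rule, not necessarily the exact procedure the proof has in mind; the clause encoding (frozensets of nonzero integers) and the function names are assumptions of the sketch.

```python
def simplify(clauses, lit):
    """Set lit to true: drop satisfied clauses, delete the falsified literal elsewhere."""
    return [c - {-lit} for c in clauses if lit not in c]


def dpll(clauses):
    """Return (satisfiable, number of nodes in the backtrack tree)."""
    if not clauses:
        return True, 1
    if any(len(c) == 0 for c in clauses):
        return False, 1
    # Unit propagation: a clause with a single literal forces that literal.
    for c in clauses:
        if len(c) == 1:
            (lit,) = c
            sat, nodes = dpll(simplify(clauses, lit))
            return sat, nodes + 1
    lit = next(iter(clauses[0]))  # naive branching rule: first literal found
    sat, left = dpll(simplify(clauses, lit))
    if sat:
        return True, left + 1
    sat, right = dpll(simplify(clauses, -lit))
    return sat, left + right + 1


unsat = [frozenset(c) for c in [(1, 2), (-1, 2), (1, -2), (-1, -2)]]
result, tree_size = dpll(unsat)
```

When the formula is unsatisfiable, the tree explored by such a procedure on a subformula with at most $\epsilon n$ variables has at most exponentially many nodes in $\epsilon n$, which is the bound the proof converts into a resolution refutation.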


Let us observe that we have stated the preceding theorem using condition
$c > \lim_{\epsilon \to 0} c(\epsilon)$ since we cannot be sure, even for k-SAT, that the
phase transition takes place at a constant value of the constraint density c. In practice one
would of course expect that, for a problem with a continuous spine, there exists a sequence
$c(\epsilon)$ as in Definition 5 having the constraint density at the phase transition as its
limit.

Definition 7 Denote by |F| the number of constraints that appear in formula F. Define
$c^*(F) = \max\{|G| / |Var(G)| : \emptyset \neq G \subseteq F\}$.

The next result gives a sufficient condition for a generalized satisfiability problem
to have a discontinuous spine. Interestingly, it is one condition studied in [?].

Theorem 2 Let C be such that SAT(C) has a sharp threshold. If there exists $\epsilon > 0$
such that for every minimally unsatisfiable formula F it holds that
$c^*(F) > \frac{1+\epsilon}{k-1}$, then SAT(C) has a discontinuous spine.
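
For small formulas the density parameter in Theorem 2 can be evaluated by exhaustive search. The sketch below assumes that c*(F) is the maximum, over nonempty subformulas G of F, of the ratio |G| / |Var(G)|, which is how the quantity is used in the proofs that follow; the representation of a formula as a list of variable tuples and the function name are assumptions of the sketch.

```python
from fractions import Fraction
from itertools import combinations


def c_star(scopes):
    """Maximum constraint/variable ratio over all nonempty subformulas (brute force)."""
    best = Fraction(0)
    for r in range(1, len(scopes) + 1):
        for sub in combinations(scopes, r):
            n_vars = len({v for scope in sub for v in scope})
            best = max(best, Fraction(r, n_vars))
    return best


cycle = [(1, 2), (2, 3), (3, 1)]                          # k = 2: three constraints, three variables
dense = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]      # k = 3: four constraints, four variables
```

For the binary cycle, c* equals 1 = 1/(k-1), the borderline "2-SAT-like" value, while the denser ternary example has c* = 1, well above 1/(k-1) = 1/2.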

Proof. The proof is similar to that of Corollary 1: we will show that w.h.p. a random
formula contains a minimally unsatisfiable subformula containing a linear number of
variables, and apply Proposition 1.

To accomplish that, we first recall the following concept from [13]:

Definition 8 Let x, y > 0. A k-uniform hypergraph with n vertices is (x, y)-sparse if
every set of $s \le xn$ vertices contains at most ys edges.
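
Definition 8 can be checked directly on small hypergraphs. The sketch below enumerates all vertex sets of size at most xn and counts the edges fully contained in each; it reads the size condition as s ≤ xn, labels vertices 0..n-1, and uses a function name chosen for illustration, all of which are assumptions of the sketch.

```python
from itertools import combinations


def is_sparse(n, edges, x, y):
    """Brute-force test of (x, y)-sparsity for a hypergraph on vertices 0..n-1."""
    edge_sets = [frozenset(e) for e in edges]
    for s in range(1, n + 1):
        if s > x * n:
            break
        for subset in combinations(range(n), s):
            chosen = set(subset)
            # An edge is "contained" in the set if all of its vertices are chosen.
            inside = sum(e <= chosen for e in edge_sets)
            if inside > y * s:
                return False
    return True


# All four triples on four vertices: the full vertex set contains 4 edges.
tetra = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
```

With y = 1/2 the hypergraph `tetra` is sparse when only sets of at most three vertices are inspected (x = 0.75) but not when the full vertex set is allowed (x = 1), which mirrors how a dense minimally unsatisfiable subformula on few variables would violate sparsity.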

We also recall Lemma 1 from the same paper. 

Lemma 1 Let k, c > 0 and y > 1/(k − 1). Then w.h.p. the random k-uniform
hypergraph with n vertices and cn edges is (x, y)-sparse, where

$x = \left( \frac{1}{2e} \left( \frac{y}{ce} \right)^{y} \right)^{1/(y(k-1)-1)}$. (8)

Let $y = \frac{1+\epsilon}{k-1}$. Directly applying Lemma 1, w.h.p. a random k-uniform
hypergraph with cn edges is $(x_0, y)$-sparse, for
$x_0 = \left( \frac{1}{2e} \left( \frac{y}{ce} \right)^{y} \right)^{1/(y(k-1)-1)}$. The
critical observation is then that the existence of a minimally unsatisfiable formula F with
fewer than $x_0 n$ variables and with $c^*(F) > \frac{1+\epsilon}{k-1}$ implies that the
k-uniform hypergraph associated with the given formula is not $(x_0, y)$-sparse. It follows
that any formula with fewer than $x_0 n / k$ constraints (and thus fewer than $x_0 n$
variables) is satisfiable. Therefore, any minimally unsatisfiable subformula of a random
formula $\Phi$ has more than $x_0 n / k$ constraints.

To show that such formulas have many variables, we again employ the expansion of
the formula hypergraph given by Lemma 1, and infer that all subformulas of size less
than $x_0 n$ of $\Phi$ (in particular those that are also subformulas of a minimally
unsatisfiable subformula of $\Phi$) have a linear number of variables.


□ 


One can give an explicitly defined class of satisfiability problems for which the
previous result applies:

Theorem 3 Let $k \ge 2$ and let C be such that SAT(C) has a sharp threshold. If no
clause template $C \in$ C has (when expressed as a CNF-formula) an implicate of length
2 or 1 then

1. For every minimally unsatisfiable formula F, $c^*(F) \ge \frac{2}{2k-3}$. Therefore SAT(C)
satisfies the conditions of the previous theorem, i.e., it has a discontinuous spine.

2. Moreover, there exists a constant $\eta > 0$ such that w.h.p. random instances of
SAT(C) have $\Omega(2^{\eta n})$ resolution complexity^2.

The condition in the theorem is violated, as expected, by random 2-SAT. It is also
violated by the random version of the NP-complete problem 1-in-k-SAT. This can be
seen as follows. The problem can be represented as CSP(C), for C a set of $2^k$
constraints corresponding to all ways to negate some of the variables, and has a rigorously
determined “2-SAT-like” location of the transition point [?]. However, the formula

C{xi,X2, . . .,Xk-l,Xk) A C{x^,Xk+l,- ■ .,X2k-2,Xi) 

f\C(x\, X 2 k — 1 ? ■ • ■ 7 X^k — 3 ^ Xk') F C(xk^ X^k— 2 ’: • ■ ■ j X^k—A: Xi'), 

where C is the constraint “1-in-k”, is minimally unsatisfiable but has clause/variable
ratio 1/(k − 1) and implicates $\bar{x}_1 \vee \bar{x}_k$ and $x_1 \vee x_k$.

Proof. 

1. For any real $r \ge 1$, formula F and set of clauses $G \subseteq F$, define the
r-deficiency of G, $\delta_r(G) = r|G| - 2|Var(G)|$. Also define

$\delta^*_r(F) = \max\{\delta_r(G) : \emptyset \neq G \subseteq F\}$ (9)

Definition 9 Let F be a formula, and let $C_1, C_2, \ldots, C_i, \ldots, C_{|F|}$ be a listing
of the constraints in F.

A variable v is private for constraint $C_i$ if v appears in $C_i$ but in no other
constraint.

Variable v is free in $C_i$ if v appears in $C_i$ but in no $C_j$, j < i. Otherwise we say
that v is bound in $C_i$.

^2 This result subsumes some of the results in [17]. While a preliminary version of this paper was under
consideration (and publicly available [?]) related and technically more sophisticated results have been given
independently in [?].
We claim that for any minimally unsatisfiable F, $\delta^*_{2k-3}(F) \ge 0$. Indeed, assume
this was not true. Then there exists such an F with

$\delta_{2k-3}(G) \le -1$ for all $\emptyset \neq G \subseteq F$. (10)


Lemma 2 Let F be a formula for which condition (10) holds. Then there exists an
ordering $C_1, \ldots, C_{|F|}$ of constraints in F such that each constraint $C_i$ contains
at least k − 2 variables that are free in $C_i$.

Proof. Denote by $v_i$ the number of variables that appear in exactly i constraints
of F. We have $\sum_{i \ge 1} i \cdot v_i = k|F|$, therefore $2|Var(F)| - v_1 \le k|F|$. This can
be rewritten as $v_1 \ge 2|Var(F)| - k|F| > |F|(2k - 3 - k) = (k - 3)|F|$, where
we use Eq. (10). Therefore there exists at least one constraint C in F with at
least k − 2 variables that are private in F, hence necessarily free in F. We set
$C_{|F|} = C$ and apply this argument recursively to $F \setminus C$. □


Let us show now that F cannot be minimally unsatisfiable. Construct a satisfying
assignment for F incrementally, so that the partial assignment constructed up to
stage j will satisfy constraints $C_1, \ldots, C_j$.

Indeed, suppose we have constructed a partial assignment that satisfies
$C_1, \ldots, C_{j-1}$, and consider now constraint $C_j$. At most two of the variables in
$C_j$ are bound in $C_j$. Since $C_j$ has no implicates of length two or less, no matter what
the assignment to these two variables might have been in the previous stages, one can set
the variables that are free in $C_j$ in a way that satisfies this clause. Iteratively
performing this construction yields a satisfying assignment for F, in contradiction
with our assumption that F was minimally unsatisfiable.

Therefore $\delta^*_{2k-3}(F) \ge 0$, a statement equivalent to our conclusion.

2. To prove the resolution complexity lower bound we use the size-width
connection for resolution complexity obtained in [?]: it is sufficient to prove that there
exists $\eta > 0$ such that w.h.p. random instances of SAT(C) having constraint
density c have resolution width at least $\eta n$.

To accomplish this, we use the same strategy as in [?]: define for an unsatisfiable
formula $\Phi$ a measure $\mu : \mathrm{Clauses} \to \mathbb{N}$ (where Clauses is the set of
all possible disjunctions of literals from $Var(\Phi)$, including the contradictory clause □)
such that

(a) for every clause C that appears in $Cl(\Phi)$, $\mu(C) \le 1$,

(b) w.h.p. $\mu(\Box)$ is “large”.

(c) Infer that in any refutation there exists a clause C with “medium” $\mu(C)$,
and

(d) prove that if $\mu(C)$ is “medium” then the width of C is “large”.
As in [?], define

$\mu(C) = \min\{|\Xi| : \Xi \subseteq \Phi,\ \Xi \models C\}$,

where $\models$ is the logical entailment relation. In particular $\mu(\Box)$ is the size of
the smallest unsatisfiable subformula of $\Phi$. $\mu$ is subadditive, that is, for every
pair of clauses $C_1$ and $C_2$ that share a variable x appearing with opposite signs in the
two clauses,

$\mu(res_x(C_1, C_2)) \le \mu(C_1) + \mu(C_2)$,

where $res_x(C_1, C_2)$ denotes the clause obtained by applying resolution to clauses
$C_1$, $C_2$ with respect to variable x. It is clear that condition (a) is satisfied. As to
(b), the following is true:

Lemma 3 There exists $\eta_1 > 0$ such that for any c > 0, w.h.p. $\mu(\Box) \ge \eta_1 n$,
where $\Phi$ is a random instance of SAT(C) having constraint density c.

Proof. In the proof of Theorem 2 we have shown that there exists $\eta_0 > 0$
such that w.h.p. any unsatisfiable subformula of a given formula has at least
$\eta_0 n$ constraints. Therefore any formula F made up of clauses from the
CNF-representation of constraints in $\Phi$, and which has fewer than $\eta_0 n$ clauses, is
satisfiable (since it is less tight than the conjunction of those constraints).

The claim now follows by taking $\eta_1 = \eta_0$. □


The only (slightly) nontrivial step of the proof, which critically uses the fact that
constraints in C do not have implicates of length one or two, is to prove that
clause implicates of subformulas of “medium” size have “many” variables.

Lemma 4 There exist d > 0 and $\eta_2 > 0$ such that w.h.p. (when $\Phi$ is a random
instance of SAT(C) having constraint density c) every clause C present in a
refutation of $Cl(\Phi)$ that satisfies $dn/2 \le \mu(C) \le dn$ also satisfies
$|C| \ge \eta_2 n$.

Proof. 

Given a clause C, let $\Xi$ be a subformula of $\Phi$, having minimal size, such that
$\Xi \models C$. We claim:

Lemma 5 For every constraint P of $\Xi$ that contains k − 2 private variables, at
least one of these variables appears in C.

Proof. Suppose there exists a constraint D of $\Xi$ with at least k − 2 private
variables such that none of its private variables appears in C. Because of the
minimality of $\Xi$ there exists an assignment F that satisfies $\Xi \setminus \{D\}$ but does
not satisfy D or C. Since D has no implicates of size two, there exists an assignment
G, that differs from F only on the private variables of D, that satisfies $\Xi$. But
since C does not contain any of the private variables of D, F coincides with G
on variables in C. The conclusion is that G does not satisfy C, contradicting the
fact that $\Xi \models C$. □


Now define $x(\cdot, \cdot)$ to be the function from Eq. (8) in Lemma 1 that describes the
dependence of x on y and c. For a constant $\epsilon > 0$ to be determined later, define

$d = \min\{\inf\{x(2/(2k - 3 + \epsilon), c) \mid c \ge c_{SAT(C)}\},\ \eta_1\}$.

Since SAT(C) has a sharp threshold, the first term of the minimum expression
is, like $\eta_1$, strictly greater than zero. Therefore, d > 0.

Lemma 6 There exists a constant $\eta_2 > 0$ such that w.h.p., when $\Phi$ is a random
instance of SAT(C) having constraint density c and $W \subseteq \Phi$ is a formula with at
most dn constraints, W contains at least $\eta_2 n$ constraints each of which has at
least k − 2 private variables.

Proof. 

To prove Lemma 6 we first need:

Lemma 7 Let $\epsilon > 0$ be a constant. If F is a formula with
$c^*(F) < \frac{2}{2k-3+\epsilon}$ then for every subformula G of F, at least
$(\epsilon/3)|G|$ constraints of G have at least k − 2 private variables.

Proof. Indeed, since $c^*(G) < \frac{2}{2k-3+\epsilon}$, by an argument similar to the one
used in the proof of Lemma 2, $v_1(G) \ge (k - 3 + \epsilon)|G|$. Since constraints in G
have arity k, at least $(\epsilon/3)|G|$ have more than k − 3 (i.e., at least k − 2) private
variables. □


Returning to the proof of Lemma 6, choose $y = \frac{2}{2k-3+\epsilon}$ in Lemma 1, for
$\epsilon > 0$ a small enough constant. Because of the definition of d, when $\Phi$ is a
random instance of SAT(C) having constraint density c, w.h.p. formula $\Phi$ is
(d, y)-sparse. Since $|W| \le dn$, this easily implies the fact that

$c^*(W) < \frac{2}{2k - 3 + \epsilon}$.

Lemma 6 follows by applying Lemma 7 to formula W with $\eta_2 = \epsilon/3$. Applying
this result and Lemma 5 to formula $\Xi$ also concludes the proof of Lemma 4.

□ 
The proof of item 2 of Theorem 3 now follows: since for any clause K in $\Phi$
we have $\mu(K) = 1$, since $\mu(\Box) > \eta_1 n$ and since $0 < d \le \eta_1$, there indeed exists
a clause C such that

$$\mu(C) \in [dn/2,\, dn]. \qquad (11)$$

Indeed, let C' be a clause in the resolution refutation of $\Phi$ minimal with the
property that $\mu(C') > dn$. Then at least one clause C of the two involved in
deriving C' satisfies Eq. (11). Applying Lemmal^ we infer that the width of C
is at least $\eta_2 n$. Using the size-width connection fromj^ completes the proof of
item 2 of Theorem 3.


□ 


4.1 Threshold location and discontinuous spines 

Molloy 121 has studied threshold properties of random constraint satisfaction problems,
describing a technical property of the constraint set (called very well-behavedness)
that is necessary for the existence of a sharp threshold. In Ga we have shown that
Molloy’s well-behavedness condition is actually necessary and sufficient for boolean
constraints (this has been independently proved by Creignou and Daudé [28]). Thus
we have completely characterized sets C for which SAT(C) has a sharp threshold.

The well-behavedness condition has implications for the clause/variable ratio of
minimally unsatisfiable formulas: it has to be larger than $1/(k-1)$. Furthermore,
Molloy has shown that if the density of minimally unsatisfiable formulas is bounded
away from $1/(k-1)$ (i.e., it satisfies the conditions of Theorem 2) then the location of
the transition is strictly larger than 

We have seen that the same density condition is sufficient to guarantee the discontinuity
of the spine and exponential resolution complexity. A natural question therefore
arises: is it possible to relate the continuity (or discontinuity) of the spine to the location
of the phase transition?

At first this does not seem to be possible: we have already encountered two problems
that fail to satisfy the sufficient condition for a discontinuous spine, random 2-SAT,
for which the transition has been proven to be of second order (7), and random
1-in-k-SAT, for which a similar result holds ED. Both have a threshold location strictly
higher than Molloy’s lower bound of However, the most natural specification 

of the random model for the two problems involves applying constraints on both variables
and their negations. For both problems the actual location of the threshold is
twice the value given by Theorem 3 in 113, at clause/variable ratio This sug¬ 

gests that the following tempting intuitive picture might be accurate, at least in a more 
restricted setting: 

1. Problems with a continuous spine are “2-SAT-like”, and have a phase transition

at constraint density ■ 

2. Problems with a discontinuous spine have a phase transition located at constraint 
density c > 
To obtain results that partly support the intuition above, we have to modify the
random model from Definition 2 to allow negated variables.

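Definition 10 Let $\mathcal{C}$ be a set of constraint templates. The closure of $\mathcal{C}$, denoted $\overline{\mathcal{C}}$, is the
set of constraints

$$\overline{\mathcal{C}} = \{\, C(x_1^{\epsilon_1}, \ldots, x_k^{\epsilon_k}) \mid C \in \mathcal{C} \text{ and } \epsilon_1, \ldots, \epsilon_k \in \{\pm 1\} \,\}, \qquad (12)$$

where for a variable $x_i$ we define $x_i^{1} := x_i$, $x_i^{-1} := \overline{x_i}$.

Set $\mathcal{C}$ is good if $|\overline{\mathcal{C}}| = |\mathcal{C}| \cdot 2^k$, that is all elements on the right hand side of Eq. (12)
are distinct.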
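
As an illustration of the closure and of the "good" condition, the following minimal sketch represents a constraint template by the set of 0/1 assignments that satisfy it. That representation, and the helper names `closure`, `is_good`, `clause`, `one_in_k` and `xor`, are assumptions made for this example and do not come from the text. Negating the i-th input of a template corresponds to flipping the i-th bit of each satisfying assignment.

```python
from itertools import product

def closure(templates, k):
    """Closure of a set of templates: all 2^k ways of negating the k inputs.

    Each template is a frozenset of satisfying assignments (tuples of 0/1).
    """
    result = set()
    for sat in templates:
        for signs in product((1, -1), repeat=k):
            # Negating input i flips bit i of every satisfying assignment.
            flipped = frozenset(
                tuple(b if s == 1 else 1 - b for b, s in zip(a, signs))
                for a in sat
            )
            result.add(flipped)
    return result

def is_good(templates, k):
    """Good means that all |C| * 2^k negated versions are distinct constraints."""
    return len(closure(templates, k)) == len(set(templates)) * 2 ** k

def clause(k):
    """Hypothetical helper: the OR template of k-SAT."""
    return frozenset(a for a in product((0, 1), repeat=k) if any(a))

def one_in_k(k):
    """Hypothetical helper: exactly one input is true."""
    return frozenset(a for a in product((0, 1), repeat=k) if sum(a) == 1)

def xor(k):
    """Hypothetical helper: odd parity of the inputs."""
    return frozenset(a for a in product((0, 1), repeat=k) if sum(a) % 2 == 1)

if __name__ == "__main__":
    print(is_good([clause(3)], 3))    # the OR template on three inputs
    print(is_good([one_in_k(3)], 3))  # the 1-in-3 template
    print(is_good([xor(3)], 3))       # parity collapses under negation
```

Under this representation the parity template is not good, because negating an even number of its inputs leaves the constraint unchanged.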

Definition 11 Let C be a good set of constraint templates. Denote by (C) the 

version of SAT(C) that generates a random formula by the following process:

1. Select, uniformly at random and with replacement, m hyperedges of the complete
k-uniform hypergraph on n variables.

2. For each hyperedge e, choose a random ordering $o_e$ of the variables involved in
it.

3. Independently with probability 1/2 negate each variable appearing in $o_e$.

4. Choose a random constraint template from C and apply it to the ordered list of
literals in $o_e$.
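
The four steps above can be sketched as a small generator. This is an illustrative sketch only: the function name `random_formula`, the encoding of a literal as a signed integer (`+v` for a variable, `-v` for its negation), and the treatment of templates as opaque labels are assumptions made for the example.

```python
import random

def random_formula(n, m, k, templates, rng=None):
    """Generate m random constraints over variables 1..n, following steps 1-4.

    Returns a list of (template, literals) pairs, where literals is a tuple
    of k signed integers.
    """
    rng = rng or random.Random()
    formula = []
    for _ in range(m):
        # Step 1: a uniformly random hyperedge, i.e. k distinct variables.
        # Edges are drawn independently, so repeats across draws are allowed.
        edge = rng.sample(range(1, n + 1), k)
        # Step 2: a random ordering of the variables in the hyperedge.
        rng.shuffle(edge)
        # Step 3: negate each variable independently with probability 1/2.
        literals = tuple(v if rng.random() < 0.5 else -v for v in edge)
        # Step 4: apply a uniformly random template to the ordered literals.
        template = rng.choice(templates)
        formula.append((template, literals))
    return formula

if __name__ == "__main__":
    for template, literals in random_formula(10, 5, 3, ["OR3"], random.Random(1)):
        print(template, literals)
```

Because each hyperedge consists of k distinct variables, no constraint produced this way contains a variable together with its negation.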

It is easy to see that problems such as k-SAT and 1-in-k-SAT can be expressed
using the framework of Definition [H]. The following result shows that the intuition
connecting the discontinuity of the spine, resolution complexity and the location of the
phase transition does indeed have merit: a strengthening of the condition guaranteeing
the existence of a discontinuous spine and exponential resolution complexity also
implies that the satisfiability threshold is located at a value higher than c“"‘:

Theorem 4 Let C be a good set such that 

1 . SAt1”®®^(C) has a sharp threshold (the result in EUS can be easily adapted to 
completely characterize such sets C). 

2. There exists $\epsilon > 0$ such that, for every minimally unsatisfiable formula F whose
constraints are drawn from template set C, the ratio of the number of constraints
in F to the number of distinct literals (variables and negated variables) appearing
in F is at least

Then 

1. There is a constant $\delta > 0$ such that random instances of SAT^"®®^ (C) with

m = cn, where c < + δ), are satisfiable with probability 1 − o(1).

2. Problem SAT^"®®^ (C) has a discontinuous spine and exponential resolution complexity.
Proof.


1. Since (C) has a sharp threshold, it is sufficient to show that there exists a

fixed constant $\eta > 0$ such that the probability that a random formula is satisfiable
is at least $\eta$.

Suppose m = cn with c = (1 + δ), with δ > 0 small enough. Define

random model (C) that is a variant of SAT^“®^ (C) as follows:

(a) Choose a random k-uniform hypergraph H with m edges on the vertex set
(of cardinality 2n) consisting of variables and negated variables.

(b) For every edge $e \in H$ create a random permutation $o_e$ of its elements.

(c) Apply a random constraint template in C to variables in the ordered list $o_e$.

This model differs from the random model SAT*'''®®^(C) in that it allows for
constraints that include a k-tuple of literals involving two opposite literals.

Define W to be the event that formula $\Phi$ contains some clause involving two
opposite literals. It is easy to see that the expected number of such clauses in a
random formula $\Phi$ is constant. Therefore, with positive probability A > 0 in a
random formula generated according to SAT 2 ''®®^(C), the bad event W will not
happen.

Let Z denote the event that a random formula with m = cn clauses generated
according to the random model SAT^”**®^ (C) is satisfiable.

Then the probability that a random formula in SAT^'^®®^ (C) is satisfiable is equal
to $\Pr[Z \mid \overline{W}]$. To show that this is bounded away from zero it is enough to prove
that $\Pr[Z] = 1 - o(1)$.
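
The last reduction can be made explicit with a short calculation. It is a sketch that assumes only what the text states: the bad event W fails to occur with probability at least A > 0, and $\Pr[Z] = 1 - o(1)$. Here $\overline{W}$ and $\overline{Z}$ denote the complements of W and Z.

```latex
\begin{align*}
\Pr[Z \mid \overline{W}]
  &= \frac{\Pr[Z \cap \overline{W}]}{\Pr[\overline{W}]}
   \;\ge\; \Pr[Z \cap \overline{W}]
   && \text{since } \Pr[\overline{W}] \le 1 \\
  &\ge \Pr[\overline{W}] - \Pr[\overline{Z}]
   && \text{since } \overline{W} \subseteq (Z \cap \overline{W}) \cup \overline{Z} \\
  &\ge A - o(1).
\end{align*}
```

So for large n the conditional probability is at least, say, A/2, which is the fixed positive constant required by the sharp-threshold argument.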

The k-uniform hypergraph on the 2n nodes (variables and their negations) corresponding
to choosing a random instance of SAT 2 '^*^®^(C) is a random k-uniform
hypergraph. Thus we want to show that a formula generated by first choosing
such a random k-uniform hypergraph H, and then applying a random constraint
template from C on the given literals is w.h.p. satisfiable.

The proof of this is entirely similar to a step in the proof of Molloy’s Theorem 3
in El, and amounts to showing that w.h.p. the hypergraph H does not contain
any hypergraph of high density, corresponding to the fact that minimally unsatisfiable
subformulas have clause/variable density at least + ε). Rather than
repeating an argument that is presented in detail in that paper, we refer the reader
to El.

2. Since C is good, one can simply apply Theorem 2 to SAT(C), which is equivalent
to problem SAT(“®)(C).


□ 
5 Beyond random satisfiability: comparing the behavior of the backbone and spine

In this section we investigate empirically the continuity of the backbone for two graph
problems, random three coloring (3-COL) and the graph bipartition problem (GBP).
Both can be phrased as decision or as optimization problems, in the same manner as
k-SAT and MAX-k-SAT.

We consider a large number of instances of random graphs, of sizes up to n = 1024 
and over a range of mean degree values near the threshold. For each instance we 
determine the backbone fraction f.

Culberson and Gent m have shown experimentally that the 3-COL spine fraction
$f_{sp}$, as defined in Definition 3, exhibits a discontinuous transition. To be consistent
with this study, we use the backbone fraction $f_{bc}$ from the same definition. We
employ a rapid heuristic called extremal optimization EOI. Although an incomplete
procedure, numerical studies m as well as testbed comparisons with an exact algorithm
[23] have shown that extremal optimization yields an excellent approximation
of $f_{bc}$ around the critical region (see EOl for further discussions that, we believe,
convincingly support this assertion). Fig. 1 shows $f_{bc}$ as a function of mean degree.

Culberson and Gent have speculated that at the 3-COL threshold, although their
spine is discontinuous, the backbone might be continuous. The results in Fig. 1a suggest
otherwise. For 3-COL, $f_{bc}$ does not appear to vanish above the threshold, indicating
a discontinuous large n backbone Eli

We next study the graph bipartition problem (GBP):

Definition 12 GBP is the following decision problem. Given a (not necessarily connected)
graph G with n vertices, n being an even number, determine whether it can be
partitioned into two edge-disjoint sets having n/2 vertices each.

This problem cannot, strictly speaking, be cast in the setup of random constraint satisfaction
problems from Definition 2, since not every partition of vertices of G is allowed.
It can be cast to a satisfiability problem (with variables associated to nodes, values associated
to each partition and constraint “x = y” associated to the edge between the
corresponding vertices) but we must add the additional requirement that all satisfying
assignments contain an equal number of ones and zeros. Thus the complexity-theoretic
observations of Section 0 do not automatically apply to it. We can, however, give a
“DPLL-like” class of algorithms for GBP, so the hope of obtaining results similar
to the previous ones is not so far-fetched.

Let us investigate the continuity of the backbone/spine under the model in Definition 0.
It is easy to see that the constraint-based spine $S_c(G)$ of a GBP instance G
contains all edges belonging to a connected component of size larger than n/2. Since
the GBP threshold takes place where the giant component becomes larger than n/2,
$f_{sc}$ is discontinuous there. On the other hand, the backbone fraction $f_{bc}$ (Fig. 1b)
appears to remain continuous, vanishing at large n on both sides of the threshold.

We have noted earlier that the discontinuity of $f_{bc}$ is a stronger property than the
discontinuity of $f_b$. Thus for 3-COL it follows that the variable-based backbone is
discontinuous as well. By contrast, it is not clear for GBP whether the variable-based
backbone is continuous: our preliminary experimental evidence is as yet inconclusive.

The results in Fig. 1b suggest that the spine and the backbone can behave differently
at the threshold, though they do not yet address the question of whether the
spine’s discontinuity really has computational implications for the decision problem’s
complexity. After all, unlike 3-COL, GBP can easily be solved in polynomial time by
dynamic programming. This situation is similar to that of XOR-SAT, where a polynomial
algorithm exists but the complexity of resolution proofs/DPLL algorithms is
exponential. The class of “DPLL-like” algorithms that can solve GBP can no longer
be simulated in a straightforward manner by resolution proofs; however, it can be simulated
using proof systems Res(k) that are extensions of resolution [24]. Some of the
hardness results for resolution extend to these more powerful proof systems, and in
[26] we investigate the extent to which our present results apply to this class of proof
systems. These preliminary results imply that, indeed, the discontinuity of the spine
does have computational implications for GBP.
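
One way the polynomial-time dynamic program for GBP might look is sketched below. It assumes that a valid bipartition is one with no edge running between the two halves, so that each connected component must lie entirely on one side; the problem then reduces to a subset-sum question over component sizes. The function name `can_bipartition` and the 0-based vertex numbering are choices made for this example, not taken from the text.

```python
def can_bipartition(n, edges):
    """Decide whether vertices 0..n-1 split into two halves with no edge across.

    Assumes every connected component must stay on one side, so the answer is
    yes exactly when some set of components has total size n/2.
    """
    if n % 2:
        return False

    # Union-find to collect the connected components.
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)

    sizes = {}
    for v in range(n):
        root = find(v)
        sizes[root] = sizes.get(root, 0) + 1

    # Subset-sum dynamic program stored as a bitset: bit s is set iff some
    # collection of components has exactly s vertices in total.
    reachable = 1
    for s in sizes.values():
        reachable |= reachable << s
    return bool((reachable >> (n // 2)) & 1)

if __name__ == "__main__":
    print(can_bipartition(4, [(0, 1), (2, 3)]))  # two components of size 2
    print(can_bipartition(4, [(0, 1), (1, 2)]))  # a component of size 3
```

In this sketch a component with more than n/2 vertices immediately rules out a valid bipartition, which matches the observation that the threshold sits where the giant component exceeds n/2.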

6 Discussion 

We have shown that the existence of a discontinuous spine in a random satisfiability 
problem is often correlated with a peak in the complexity of resolution/DPLL 
algorithms at the transition point. The underlying reason is that the two phenomena 
(the jump in the order parameter and the resolution complexity lower bound) have 
common causes. 

The example of random k-XOR-SAT shows that a general connection between a
first-order phase transition and the complexity of the underlying decision problems is
hopeless: Ricci-Tersenghi et al. [3] have presented a non-rigorous argument using the
replica method that shows that this problem has a first-order phase transition, and the
following weaker result is a direct consequence of Theorem 3:

Proposition 2 Random k-XOR-SAT, $k \ge 3$, has a discontinuous spine.

However, our results, as well as work in progress mentioned above, suggest that the 
continuity/discontinuity of the spine is a predictor for the complexity of the restricted 
classes of decision algorithms that can be simulated by “resolution-like” proof systems. 
Furthermore, experimental evidence in the previous section suggests that the backbone 
and the spine do not always behave similarly. Our analysis indicates that the spine, 
rather than the backbone, is the order parameter to consider in studying the complexity 
of combinatorial problems. 
(a) 3-COL ED. (b) GBP.

Figure 1: Plot of the estimated constraint-based backbone fraction $f_{bc}$ on random
graphs, as a function of mean degree c. For 3-COL, the systematic error based on
benchmark comparisons with random graphs is negligible compared to the statistical
error bars; for GBP, $f_{bc}$ is found by exact enumeration. The thresholds $c \approx 4.70$ for
3-COL and $c = 2 \ln 2$ for GBP are shown by dashed lines.